Pseudotime Analysis: Tracing Cell Development Over Time

This notebook uses pseudotime analysis to study how cells change and develop over time.

Pseudotime is a computed ordering of cells along a continuous timeline, based on their gene expression patterns. It can be inferred even when we only have snapshots of cells taken at a few discrete time points.

What We'll Do:

  1. Run two different pseudotime methods:
    • DPT (Diffusion Pseudotime): Uses diffusion maps to trace cell development
    • Palantir: A more sophisticated method that can handle multiple cell fates
  2. Compare the results: See how well both methods agree on cell development timing

Preparing the Data for Analysis

Before we can analyze pseudotime, we need to clean and prepare our data:

  1. Normalize: Rescale each cell's counts to the same total, so that differences in sequencing depth do not dominate
  2. Log transform: Reduce the impact of very highly expressed genes
  3. Find variable genes: Identify genes that change the most between cells (these are most informative)
  4. Scale: Standardize gene expression values
  5. Reduce dimensions: Use PCA to focus on the most important patterns
  6. Build neighborhood graph: Find which cells are similar to each other
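The six steps above can be sketched with plain NumPy on a toy count matrix. This is only an illustration of what each step computes, not the notebook's actual code. In a scanpy workflow (an assumption about the tooling) the steps would typically map to `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, `sc.pp.scale`, `sc.pp.pca` and `sc.pp.neighbors`. The matrix sizes, the target total of 10,000, and the variance-based gene selection are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy count matrix: 60 cells x 40 genes (made-up data, not the notebook's).
gene_rates = rng.uniform(0.5, 5.0, size=40)
counts = rng.poisson(lam=gene_rates, size=(60, 40)).astype(float)

# 1. Normalize: rescale every cell to the same total count (here 10,000).
totals = counts.sum(axis=1, keepdims=True)
norm = counts / totals * 1e4

# 2. Log transform: log(1 + x) compresses very highly expressed genes.
logged = np.log1p(norm)

# 3. Variable genes: keep the genes with the largest variance across cells
#    (a simplification of dispersion-based selection).
n_top = 20
top_genes = np.argsort(logged.var(axis=0))[::-1][:n_top]
hvg = logged[:, top_genes]

# 4. Scale: zero mean and unit variance per gene.
scaled = (hvg - hvg.mean(axis=0)) / hvg.std(axis=0)

# 5. Reduce dimensions: PCA via SVD of the scaled matrix.
n_pcs = 10
u, s, _ = np.linalg.svd(scaled, full_matrices=False)
pcs = u[:, :n_pcs] * s[:n_pcs]

# 6. Neighborhood graph: indices of the k nearest cells in PCA space.
k = 5
dists = np.linalg.norm(pcs[:, None, :] - pcs[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)  # a cell is not its own neighbor
neighbors = np.argsort(dists, axis=1)[:, :k]
```

Each row of `neighbors` lists the cells most similar to that cell. Both pseudotime methods below build on this kind of graph.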

Let's first look at the experimental time points and the cell types.


For pseudotime analysis, we need to define:

  • Start cell: Where development begins (a cell sampled at the start of the experiment, 0hr)
  • Terminal states: Where development ends (like differentiated cell types)

In our case:

  • We start with an undifferentiated cell
  • We end with two possible fates: prespore cells and prestalk cells
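As a rough sketch of how a start cell and terminal cells might be picked from the cell annotations: the table below is made up, and the column names `time` and `cell_type` as well as the labels are assumptions for illustration. Picking the first 0hr cell and the last cell of each fate is an arbitrary choice here.

```python
import numpy as np
import pandas as pd

# Hypothetical per-cell annotations; column names and labels are assumptions.
obs = pd.DataFrame(
    {
        "time": ["0hr", "0hr", "4hr", "8hr", "8hr", "12hr"],
        "cell_type": [
            "undifferentiated", "undifferentiated", "undifferentiated",
            "prespore", "prestalk", "prespore",
        ],
    },
    index=[f"cell_{i}" for i in range(6)],
)

# Start cell: the first cell sampled at 0hr (an arbitrary pick among them).
start_cell = obs.index[obs["time"] == "0hr"][0]
# Integer position of the start cell, for tools that expect an index.
start_position = int(np.flatnonzero(obs.index == start_cell)[0])

# Terminal states: one representative cell per terminal fate.
terminal_cells = {
    fate: obs.index[obs["cell_type"] == fate][-1]
    for fate in ("prespore", "prestalk")
}
```

scanpy's DPT reads the root as an integer position from `adata.uns["iroot"]`, while Palantir is usually given cell names, so keeping both forms around can be convenient.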

Method 1 - DPT (Diffusion Pseudotime)

DPT works like this:

  1. Diffusion map: Treat each cell as a node in a graph whose edges are weighted by transcriptomic similarity, so closely related cells share stronger connections. Random walks on this graph define a low-dimensional diffusion space.

  2. Pseudotime calculation: From a chosen start cell, measure the diffusion distance to every other cell. This distance is based on random walks over the graph rather than on a single shortest path, and it gauges how far each cell lies along the developmental manifold.

This gives us a simple timeline of development - early cells have low pseudotime, late cells have high pseudotime.
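To make the two steps concrete, here is a minimal NumPy sketch of the diffusion pseudotime idea on toy data: 30 cells laid out along a single axis. It leaves out details that real implementations add (for example density normalization of the kernel), and the kernel width `sigma` and the toy positions are assumptions.

```python
import numpy as np

# Toy data: 30 cells placed along a one-dimensional developmental axis.
n_cells = 30
position = np.arange(n_cells, dtype=float)

# 1. Diffusion map: Gaussian affinities turned into a symmetric,
#    degree-normalized transition matrix.
sigma = 2.0
sq_dist = (position[:, None] - position[None, :]) ** 2
kernel = np.exp(-sq_dist / (2 * sigma**2))
degree = kernel.sum(axis=1)
transition = kernel / np.sqrt(np.outer(degree, degree))

# Eigen-decomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(transition)
# Drop the stationary component (largest eigenvalue, equal to 1).
eigvals, eigvecs = eigvals[:-1], eigvecs[:, :-1]

# 2. Pseudotime: summing random walks of all lengths weights each
#    eigenvector by lambda / (1 - lambda); the diffusion pseudotime
#    distance is the Euclidean distance in these weighted coordinates.
coords = eigvecs * (eigvals / (1.0 - eigvals))
root = 0  # start cell
pseudotime = np.linalg.norm(coords - coords[root], axis=1)
pseudotime /= pseudotime.max()  # rescale to [0, 1]
```

The start cell gets pseudotime 0, and cells further along the axis should receive larger values.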


Method 2 - Palantir Analysis

Palantir is a more sophisticated method that can handle cells developing into multiple different fates (like our prespore and prestalk cells).

Palantir works in several steps:

  1. Diffusion maps: Build a detailed map of cell relationships
  2. Multiscale space: Look at patterns at different scales of resolution
  3. Trajectory calculation: Order cells by shortest-path distance from the start cell, then calculate the probability that each cell will become each final cell type
Sampling and flocking waypoints...
Time for determining waypoints: 0.07117983102798461 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.6211285988489786 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9999
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...
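The "fundamental matrix and absorption probabilities" step in the log above refers to a standard result for absorbing Markov chains. Here is a small sketch on a made-up chain with three transient states and two absorbing states (prespore, prestalk). The transition probabilities are invented for illustration and are not taken from the data.

```python
import numpy as np

# Transient states: t0 (early), t1, t2. Absorbing states: prespore, prestalk.
# Q holds transient -> transient probabilities, R transient -> absorbing.
Q = np.array([
    [0.0, 0.5, 0.5],   # t0 moves on to t1 or t2
    [0.0, 0.0, 0.2],   # t1 occasionally moves to t2
    [0.0, 0.0, 0.0],   # t2 only moves to a terminal state
])
R = np.array([
    [0.0, 0.0],        # t0 cannot reach a terminal state directly
    [0.8, 0.0],        # t1 mostly becomes prespore
    [0.0, 1.0],        # t2 always becomes prestalk
])

# Each row of [Q | R] must sum to 1.
assert np.allclose(Q.sum(axis=1) + R.sum(axis=1), 1.0)

# Fundamental matrix N = (I - Q)^-1; absorption probabilities B = N @ R.
# Solving the linear system avoids forming the inverse explicitly.
identity = np.eye(Q.shape[0])
branch_probs = np.linalg.solve(identity - Q, R)
# branch_probs[i, j] = probability that transient state i ends in fate j
```

For this chain, the early state `t0` ends up as prespore with probability 0.4 and as prestalk with probability 0.6. These per-cell fate probabilities are what the branch probability plots show.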

Visualizing Palantir Results

Palantir creates several useful plots:

  • Pseudotime: How far along development each cell is
  • Branch probabilities: How likely each cell is to become prespore vs prestalk
  • Entropy: How "decided" each cell is (low entropy = committed to one fate)
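To see why low entropy means a committed cell, here is a small sketch that computes the Shannon entropy of made-up branch probabilities. The three example cells and the use of the natural logarithm are assumptions for illustration.

```python
import numpy as np

# Hypothetical branch probabilities; columns are prespore, prestalk.
branch_probs = np.array([
    [0.5, 0.5],   # undecided cell
    [0.9, 0.1],   # leaning towards prespore
    [1.0, 0.0],   # fully committed to prespore
])


def fate_entropy(probs):
    """Shannon entropy (natural log) of each row, treating 0 * log(0) as 0."""
    safe = np.where(probs > 0, probs, 1.0)  # log(1) = 0 drops zero terms
    return -(probs * np.log(safe)).sum(axis=1)


entropy = fate_entropy(branch_probs)
```

The undecided cell has the highest entropy (log 2 for two fates), and the fully committed cell has entropy 0.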

We can identify cells that are clearly committed to specific developmental paths.


Finally, we can draw arrows showing the predicted developmental paths from our starting cell to the final cell types.


Comparing Both Methods

Now let's compare how well DPT and Palantir agree with each other and with our known information:

  1. Experimental time: What we actually measured
  2. DPT pseudotime: What the simpler method predicted
  3. Palantir pseudotime: What the more sophisticated method predicted

Both methods give similar results.
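One way to put a number on this kind of agreement is a rank (Spearman) correlation between the three quantities. The sketch below uses synthetic data, not the notebook's results. The column names follow the ones used in this notebook, while the sampling times and noise levels are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic example: 25 cells at each of six hypothetical sampling times.
hours = np.repeat([0, 4, 8, 12, 16, 20], 25)
# Hidden "true" progress of each cell, with some cell-to-cell variation.
progress = hours / 20 + rng.normal(0, 0.05, size=hours.size)

df = pd.DataFrame({
    "time": hours,
    "dpt_pseudotime": progress + rng.normal(0, 0.05, size=hours.size),
    "palantir_pseudotime": progress + rng.normal(0, 0.05, size=hours.size),
})

# Spearman correlation equals the Pearson correlation of the ranks;
# rank() assigns average ranks to ties, which matters for "time".
spearman = df.rank().corr()
```

`spearman` is a 3 x 3 table. Values close to 1 mean that two orderings agree on which cells are early and which are late, even if the absolute pseudotime values differ.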


Comparison by Cell Type

Let's create violin plots to see how pseudotime values are distributed within each cell type, using the classification of cell types from the previous analysis.
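The per-group numbers behind such plots can also be tabulated with a pandas `groupby`. The sketch below uses a made-up annotation table, and the `cell_type` column name and its labels are assumptions. Passing `observed=True` explicitly for a categorical key is what recent pandas versions ask for, and it avoids the related `FutureWarning`.

```python
import pandas as pd

# Hypothetical annotations; "cell_type" is an assumed column name.
obs = pd.DataFrame({
    "cell_type": pd.Categorical(
        ["undifferentiated", "prespore", "prestalk",
         "prespore", "prestalk", "undifferentiated"],
        # "unassigned" is a category that no cell actually has.
        categories=["undifferentiated", "prespore", "prestalk", "unassigned"],
    ),
    "dpt_pseudotime": [0.05, 0.80, 0.90, 0.70, 1.00, 0.15],
    "palantir_pseudotime": [0.10, 0.75, 0.85, 0.65, 0.95, 0.20],
})

# observed=True keeps only categories that occur in the data,
# so the empty "unassigned" group is left out of the summary.
summary = obs.groupby("cell_type", observed=True)[
    ["dpt_pseudotime", "palantir_pseudotime"]
].agg(["median", "min", "max"])
```

`summary` has one row per cell type that is present and one column per method and statistic.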


Analysis of pseudotime using UCE embeddings

Now let's repeat the analysis using the UCE embeddings as the cell representation, instead of the preprocessed counts.

Sampling and flocking waypoints...
Time for determining waypoints: 0.08182460069656372 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.3875067989031474 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9998
Correlation at iteration 2: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...

Visualizing Palantir Results

Palantir creates several useful plots:

  • Pseudotime: How far along development each cell is
  • Branch probabilities: How likely each cell is to become prespore vs prestalk
  • Entropy: How "decided" each cell is (low entropy = committed to one fate)

We can identify cells that are clearly committed to specific developmental paths.


Finally, we can draw arrows showing the predicted developmental paths from our starting cell to the final cell types.


Comparing Both Methods

Now let's compare how well DPT and Palantir agree with each other and with our known information:

  1. Experimental time: What we actually measured
  2. DPT pseudotime: What the simpler method predicted
  3. Palantir pseudotime: What the more sophisticated method predicted

Comparison by Cell Type

Let's create violin plots to see how pseudotime values are distributed within each cell type, using the classification of cell types from the previous analysis.
